ABSTRACT

In completable planning, a planning system is given the ability to defer goals
which it can prove to be achievable. This has the advantages of allowing the
utilization of runtime information in planning and enabling a planner to use less
precise a priori information without sacrificing guarantees of success. In this paper,
we extend completable planning to goals which are only probably achievable, thus
extending its scope to a wider variety of problems. We also define completable plans
in terms of their constituent reactive plan components, conditionals and repeat-loops,
which achieve the deferred goals, and we discuss the costs incurred by completable
planning in terms of runtime evaluation cost, plan flexibility, a priori planning cost,
and guarantees of success. In extending completable planning to probable achievability,
we also introduce incremental explanation-based learning strategies for learning
probably completable conditionals and probably completable repeat-loops, and
demonstrate the learning of a probably completable plan in a simple train
route-planning example.
INTRODUCTION

Recent years have seen a growing interest in planning systems which have both
a priori planning capabilities for providing goal-directedness, and reactive runtime
capabilities for providing flexibility and sensitivity to the runtime environment
[Drummond90, Gervasio90, Kaelbling88, Mitchell90, Turney89]. This interest is a
response to the impracticality of classical planning [Chapman87, Sacerdoti77,
Sussman73], which constructs complete provably-correct plans but requires complete
and correct a priori information to do so, an unrealistic demand in many real world
domains. These same domains, however, although uncertain in some aspects, also often
follow particular predictable patterns of behavior, and thus some a priori planning is
both possible and desirable, in contrast to the unpredictable, dynamic environments
addressed by situated action [Agre87, Suchman87].

In [Gervasio90], we presented an integrated planning approach wherein a classical
planner is augmented with the ability to defer achievable goals, where achievability
is simply defined as the existence of a plan which would achieve the goal during
execution. If a planner could prove the existence of such a plan (not necessarily by
determining the plan itself), the goal could be deferred until execution, when
additional information could become available to make better-informed planning
decisions. Furthermore, a system minimizes its a priori information requirements by
being able to use less precise information. Also, because the deferred goals have
achievability proofs, a system can still construct provably-correct plans. We also
presented contingent explanation-based learning, a strategy for learning general
completable reactive plans, which introduced the idea of conjectured variables to
distinguish between a priori and runtime planning decisions.

A limitation of this original approach to completable planning was the requirement
of absolute achievability. Consider the problem of hammering a nail into the
wall, where a pound action will often result in driving the nail further into the wall,
but may sometimes end up getting the nail bent instead. A completable planner
requiring absolute achievability would not be able to solve such a problem, because
for the same reason (knowledge limitations) that it cannot determine a priori the
precise number of pounding actions to use, it cannot guarantee that the nail will not
be bent by a pounding action. However, a completable planner which can reason about
actions with different possible outcomes, and construct or learn alternative plans, can
construct probably completable plans which have a good chance of success.

In this paper, we extend the completable planning approach to probable
achievability. Introducing probable achievability not only extends the scope of
completable planning to a wider variety of planning problems, but also opens up new
opportunities to investigate the learning of completable plans. We begin by formalizing
the idea of completable planning by defining the various components of a completable
plan and the achievability constraints on these different components, and by
discussing some of the cost tradeoffs involved in completable planning. We then
present incremental learning strategies for learning probably completable plans and
demonstrate their use in learning an increasingly completable plan for determining
travel routes in a simple train domain. Finally, we discuss some related and future
work.


COMPLETABLE PLANS

In completable planning, a planner may decide upon certain actions a priori and
use runtime information gathered through its sensors to decide upon other actions
during execution. All projection is done prior to execution, and runtime
decision-making is limited to being reactive; i.e. during execution, the system decides
on its next action on a predetermined basis, represented by conditionals and
repeat-loops.¹ Uncertainty is characterized by using state descriptions which
correspond to a set of states rather than unique, single states. Let a state description
S be a conjunction of atomic sentences, and states(S) represent the set of all states in
which S is satisfied. For a plan component p in a plan P, let PREC(p) be the state
description resulting from regressing the goal back through all later plan components
in P to p. Similarly, let EFF(p) be the result of projecting the initial state through all
earlier plan components in P to p.

Given an initial state description I and goal state description G, a provably-correct
plan P for I and G is an ordered sequence of plan components of the form
$\langle p_1; p_2; \ldots; p_n \rangle$, constrained in the following manner:

$states(I) \subseteq states(PREC(p_1))$

For $p_i \in \langle p_1, p_2, \ldots, p_{n-1} \rangle$, $states(EFF(p_i)) \subseteq states(PREC(p_{i+1}))$

$states(EFF(p_n)) \subseteq states(G)$.

This is shown graphically in Figure 1.

[Figure 1 labels: states(I), states(PREC(p_1)), states(EFF(p_1)), states(PREC(p_2)),
states(EFF(p_2)), states(PREC(p_3)), ..., states(EFF(p_n)), states(G).]

Figure 1. A plan consisting of unconditional actions.
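
When the state sets are small enough to enumerate, the chain of containment constraints on a provably-correct plan can be checked mechanically. The following Python sketch is illustrative only: the `Component` type, the `is_provably_correct` function, and the explicit sets of state labels are assumptions made for this example and are not part of the system described in this paper.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence

State = str  # hypothetical: a state is identified by a label


@dataclass(frozen=True)
class Component:
    """A plan component with explicitly enumerated state sets (an assumption)."""
    name: str
    prec_states: FrozenSet[State]  # stands in for states(PREC(p))
    eff_states: FrozenSet[State]   # stands in for states(EFF(p))


def is_provably_correct(initial: FrozenSet[State],
                        plan: Sequence[Component],
                        goal: FrozenSet[State]) -> bool:
    """Check the subset constraints for a plan <p1; ...; pn>."""
    if not plan:
        # Assumption: an empty plan is correct only if I already entails G.
        return initial <= goal
    if not initial <= plan[0].prec_states:
        return False
    # Each component's effects must lie within the next one's preconditions.
    for current, following in zip(plan, plan[1:]):
        if not current.eff_states <= following.prec_states:
            return False
    return plan[-1].eff_states <= goal


p1 = Component("p1", frozenset({"s0", "s1"}), frozenset({"s2"}))
p2 = Component("p2", frozenset({"s2", "s3"}), frozenset({"s4"}))
ok = is_provably_correct(frozenset({"s0"}), [p1, p2], frozenset({"s4", "s5"}))
```

Here `ok` is true because every set in the chain is contained in the next; starting from a state outside states(PREC(p1)) makes the check fail.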

There are three types of plan components: unconditional actions, conditionals,
and repeat-loops. Unconditional actions are actions to be executed without
environmental input. Classical planners [Chapman87, Sacerdoti77, Sussman73] can be
characterized as constructing plans consisting solely of unconditional actions, and
thus unconditional actions can be said to constitute the classical part of a completable
plan, while conditionals and repeat-loops constitute the reactive part. In completable
planning, the deferred goals addressed by the reactive components must be achievable,
and thus constraints must be placed on conditionals and repeat-loops to guarantee
their achievement of the preconditions of succeeding actions.


1. The decision to allow no further planning during execution is not a theoretical claim. The primary focus
of this research is on learning, and as such, the system currently has simple planning capabilities. However,
other planning capabilities may be added as the research progresses and new learning avenues are explored.
Probably Completable Conditionals

Conditionals deal with the problem of over-general initial state descriptions.
Prior to execution, all a planner may know is that at a particular point in the execution
of a plan, it will be in one of several possible states satisfying some description.
However, the different states satisfying this description may require different actions
to achieve the preconditions for succeeding actions. For example, in planning to get
to some higher floor in a new building, after going through the front door, all you may
know is that you will be in the lobby. However, depending upon various factors such
as whether there will be a staircase or an elevator or both, your proximity to each one,
which floor you wish to go to, and the functionality of the elevator, you would like
to take different actions. Through conditionals, decisions can be made during
execution regarding appropriate actions, using any additional information which
becomes available at that point in execution.

A conditional is of the form {COND $c_1 \to q_1$; $c_2 \to q_2$; ...; $c_n \to q_n$}, where
each $c_i \to q_i$ is an action-decision rule which represents the decision to execute the
plan $q_i$ when the test $c_i$ yields true. Like the situation-action type rules used in
reactive systems such as [Kaelbling88, Mitchell90, Schoppers87], action-decision rules
map different situations into different actions, allowing a system to make decisions
based on its current environment.
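
As a rough illustration of how a conditional of this form might be evaluated at execution time, the Python sketch below tries the action-decision rules in order and returns the plan of the first rule whose test holds. The first-match policy, the `evaluate_conditional` function, and the dictionary of sensor readings are assumptions made for this example; the lobby rules are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Observation = Dict[str, bool]  # hypothetical sensor readings
Rule = Tuple[Callable[[Observation], bool], List[str]]  # (test c_i, plan q_i)


def evaluate_conditional(rules: List[Rule],
                         obs: Observation) -> Optional[List[str]]:
    """Return the plan q_i of the first rule whose test c_i holds, else None."""
    for test, plan in rules:
        if test(obs):
            return plan
    return None  # no rule applies: the conditional does not cover this state


lobby_rules: List[Rule] = [
    (lambda o: o["elevator_working"], ["walk_to_elevator", "ride_elevator"]),
    (lambda o: o["staircase_present"], ["walk_to_stairs", "climb_stairs"]),
]

chosen = evaluate_conditional(
    lobby_rules, {"elevator_working": False, "staircase_present": True})
```

With the elevator out of order and a staircase present, `chosen` is the staircase plan; when neither test holds the function returns `None`.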

In a completable plan, however, a conditional $p_i$ = {COND $c_1 \to q_1$; $c_2 \to q_2$;
...; $c_n \to q_n$} must also satisfy the following constraints for achievability:

1. Exhaustiveness: $states(c_1 \lor c_2 \lor \ldots \lor c_n)$ must be an exhaustive subset of
$states(EFF(p_{i-1}))$.

2. Observability: each $c_i$ must consist of observable conditions, where an
observable condition is one for which there exists a sensor which can verify
the truth or falsity of the condition.

3. Achievement: for each $q_i$, $states(EFF(q_i)) \subseteq states(PREC(p_{i+1}))$.

This is shown graphically in Figure 2.

[Figure 2 labels: states(EFF(p_{i-1})); states(c_1), states(c_2), states(c_3);
states(EFF(q_1)), states(EFF(q_2)), states(EFF(q_3)); states(PREC(p_{i+1})).]

Figure 2. A completable conditional $p_i$ with three action-decision rules.

For probably completable plans, the exhaustiveness constraint is relaxed to require
only probable exhaustiveness, and the greater the coverage, the greater the
conditional’s chance of achieving $PREC(p_{i+1})$. The observability constraint requires
knowledge of sensory capability, and here we use the term sensor in the broader sense
of some set of sensory actions, which we will assume the system knows how to execute
to verify the associated condition. It is needed to ensure that the conditional can be
successfully evaluated during execution. Finally, the achievement constraint ensures
that the actions taken in the conditional achieve the preconditions of the succeeding
plan component. Provided these three constraints are satisfied, the conditional is
considered probably completable, and the goal $PREC(p_{i+1})$ of the conditional is
probably achievable.
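
The relaxed exhaustiveness and achievement constraints can be sketched in Python under strong simplifying assumptions: states are enumerated labels, each state reachable after the preceding component carries a numeric probability, and every test is taken to be observable. None of these assumptions, nor the names `coverage` and `probably_completable`, come from the system described here, which uses qualitative probabilities.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

State = str
# (test c_i over a state, enumerated stand-in for states(EFF(q_i)))
Rule = Tuple[Callable[[State], bool], FrozenSet[State]]


def coverage(rules: List[Rule], prev_eff: Dict[State, float]) -> float:
    """Probability mass of the prior effect states matched by some test c_i."""
    return sum(prob for state, prob in prev_eff.items()
               if any(test(state) for test, _ in rules))


def satisfies_achievement(rules: List[Rule],
                          next_prec: FrozenSet[State]) -> bool:
    """Every alternative must end inside the next component's preconditions."""
    return all(eff <= next_prec for _, eff in rules)


def probably_completable(rules: List[Rule], prev_eff: Dict[State, float],
                         next_prec: FrozenSet[State],
                         min_coverage: float) -> bool:
    return (satisfies_achievement(rules, next_prec)
            and coverage(rules, prev_eff) >= min_coverage)


# Hypothetical lobby example: probabilities are invented for illustration.
lobby = {"elevator_ok": 0.70, "stairs_only": 0.25, "no_way_up": 0.05}
upstairs = frozenset({"on_target_floor"})
rules: List[Rule] = [
    (lambda s: s == "elevator_ok", upstairs),
    (lambda s: s == "stairs_only", upstairs),
]
```

In this example the two rules cover the states in which some way up exists, so the conditional passes a coverage threshold of 0.9 but not one of 0.99.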

Probably  Completable  Repeat-Loops 

A repeat-loop is of the form {REPEAT q UNTIL c}, which represents the decision
to execute the plan q until the test c yields true. Repeat-loops are similar in idea
to servo-mechanisms; but in addition to the simple yet powerful failure-recovery
strategy such mechanisms provide, repeat-loops also permit the construction of
repeated action sequences achieving incremental progress towards the goal, which
may be viewed as a reactive, runtime method of achieving generalization-to-N
[Cohen88, Shavlik87]. Repeat-loops are thus useful in completable plans mainly for
two reasons: simple failure recovery and iterations for incremental progress.

Repeat-loops for simple failure-recovery are useful with actions having
nondeterministic effects, which arise from knowledge limitations preventing a planner
from knowing which of several possible effects a particular action will have. For
example, in attempting to unlock the door to your apartment, driving the key to the
keyhole will most often result in the key lodging into the hole. However, once in a
while, the key may end up jamming beside the hole instead; but repeating the procedure
often achieves the missed goal. In completable planning, if an action has several
possible outcomes, and if the successful outcome is highly probable, and if the
unsuccessful ones do not prevent the eventual achievement of the goal, then a
repeat-loop can be used to ensure the achievement of the desired effects.

A repeat-loop p = {REPEAT q UNTIL c} for failure-recovery must satisfy the
following constraints for achievability:

1. Observability: c must be an observable condition.

2. Achievement: c must be a probable effect of q.

3. Repeatability: the execution of q must not irrecoverably deny the
preconditions of q until c is achieved.

This is shown graphically in Figure 3a. The observability constraint is needed, once
again, to be able to guarantee successful evaluation, while the achievement and
repeatability constraints together ensure a high probability of eventually exiting the
repeat-loop with success. As with the exhaustiveness constraint for conditionals, the
repeatability constraint may be relaxed so that the execution of q need only probably
preserve or probably allow the reachievement of the preconditions of q.
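
To make the "high probability of eventually exiting" concrete, the following Python sketch computes the chance of at least one success in k attempts and runs a failure-recovery loop. It assumes, purely for illustration, that attempts succeed independently with the same probability; the attempt cap `max_attempts` and the scripted key-insertion action are likewise assumptions of this example, since the repeat-loop form itself has no such cap.

```python
from typing import Callable, Iterable, Optional, Tuple


def success_within(p: float, k: int) -> float:
    """Probability of at least one success in k independent attempts."""
    return 1.0 - (1.0 - p) ** k


def repeat_until(q: Callable[[], None], c: Callable[[], bool],
                 max_attempts: int) -> Optional[int]:
    """Run {REPEAT q UNTIL c}; return the number of attempts, or None."""
    for attempt in range(1, max_attempts + 1):
        q()
        if c():
            return attempt
    return None  # cap reached without observing c (cap is an assumption)


def make_key_insertion(outcomes: Iterable[bool]
                       ) -> Tuple[Callable[[], None], Callable[[], bool]]:
    """Hypothetical action with scripted outcomes, for reproducibility."""
    state = {"key_in_lock": False}
    script = iter(outcomes)

    def insert_key() -> None:
        state["key_in_lock"] = next(script)

    def key_is_in() -> bool:
        return state["key_in_lock"]

    return insert_key, key_is_in


insert_key, key_is_in = make_key_insertion([False, False, True])
attempts = repeat_until(insert_key, key_is_in, max_attempts=5)
```

With a per-attempt success probability of 0.9, two attempts already succeed with probability 0.99 under the independence assumption; in the scripted run the loop exits on the third attempt.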

Repeat-loops for incremental progress deal with over-general effect state
descriptions. Once again, knowledge limitations may result in a planner not having
precise enough information to make action decisions a priori. In actions which result
in changing the value of a quantity, for example, your knowledge may be limited to
the direction of change or to a range of possible new values, which may not be
specific enough to permit making decisions regarding precise actions; for example,
determining the precise number of action repetitions or the precise length of time over
which to run a process in order to achieve the goal. The implicit determination of
such values during execution is achieved in completable planning through the use
of repeat-loops which achieve incremental progress towards the goal and use
runtime information to determine when the goal has been reached.

A repeat-loop p = {REPEAT q UNTIL c} for incremental progress must satisfy the
following constraints for achievability:

1. Continuous observability: c must be an observable condition which checks
a particular parameter for equality to a member of an ordered set of values
(for example, a value within the range of acceptable values for a quantity).

2. Incremental achievement: each execution of q must result in incremental
progress towards, and eventually achieve, c; i.e. it must reduce the
difference between the previous parameter value and the desired parameter
value by at least some finite non-infinitesimal $\epsilon$.

3. Repeatability: the execution of q must not irrecoverably deny the
preconditions of q until c is achieved.

This is shown graphically in Figure 3b.

[Figure 3 panels: a. Failure recovery (labels: probable successful outcome, states(c)).
b. Incremental progress.]

Figure 3. Completable repeat-loops.

The continuous observability constraint ensures that the progress guaranteed by the
incremental achievement and repeatability constraints can be detected and the goal
eventually verified. For both failure recovery and iterations for incremental progress,
if the repeat-loop satisfies the constraints, the repeat-loop is considered probably
completable and the goal c is achievable.
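
The incremental achievement constraint implies a simple bound: if every execution of q closes the gap to the goal by at least ε, a gap of size D is closed in at most ⌈D/ε⌉ iterations. The Python sketch below illustrates this with a hypothetical nail-driving example; the function names, the integer depths, and the assumption that the parameter only ever increases towards the target are all made up for illustration.

```python
import math
from typing import Callable, Tuple


def max_iterations(initial_gap: float, epsilon: float) -> int:
    """Upper bound on loop iterations when each step closes at least epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive (non-infinitesimal progress)")
    return max(0, math.ceil(initial_gap / epsilon))


def run_until_reached(value: int, target: int,
                      q: Callable[[int], int]) -> Tuple[int, int]:
    """Run {REPEAT q UNTIL c} where c observes that value has reached target."""
    iterations = 0
    while value < target:      # exit test c: value is in the acceptable range
        value = q(value)       # q is assumed to increase value by >= epsilon
        iterations += 1
    return value, iterations


# Hypothetical: the nail must be 10 mm deep; each pound drives it 3 mm.
bound = max_iterations(initial_gap=10, epsilon=3)
final_depth, pounds = run_until_reached(0, 10, lambda depth: depth + 3)
```

The planner never needs to know the number of pounds in advance: the loop determines it at runtime, and the bound only confirms that the loop terminates.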

Completable  Plans  vs.  Universal  Plans 

A universal plan [Schoppers87] is the compilation of a planner’s knowledge of
actions with respect to a goal into a set of situation-action rules which provide a
system with advice on what to do next in all possible situations, and may thus be
viewed as the transformation of a set of classical plans into a purely reactive plan. A
completable plan with conditionals is essentially derivable from a set of classical plans
as well; i.e. a completable plan with a conditional may be viewed as a set of classical
plans, one for each alternative. The primary difference is that in a universal plan, all
the action decisions are made during execution, while in a completable plan, only
particular planning decisions are left for execution.

For a particular planning problem, let U be a universal plan and C be a completable
plan, where U corresponds to some complete set of classical plans P and C
corresponds to some subset P' of P. Assume that the procedure for evaluating both a
universal plan and a conditional takes as input a set of rules and outputs one as the rule
to be applied. Let:

$r$ = the average number of action-decision rules in a conditional of C
$u$ = the number of situation-action rules in U
$c_r$ = the average cost of evaluating a rule

Then $r c_r$ is the cost of evaluating a conditional and $u c_r$ is the cost of evaluating the
situation-action rules. In the worst case, C will itself be equivalent to a universal plan
for P', with every action determined by a conditional containing all the
situation-action rules for P', but since $P' \subseteq P$, $r c_r \leq u c_r$. Furthermore, let:

$n$ = the average number of actions taken using U to achieve the goal
$d$ = the number of deferred decisions (i.e. conditionals) in C.

Since every action decision in U is made by evaluating the rules of U, the total
runtime evaluation cost of U is $n u c_r$, while that of C is $d r c_r$. Since in C some action
decisions will generally be made a priori, $d$ will generally be less than $n$, and thus
$d r c_r < n u c_r$. Intuitively, it seems the size $r$ of a conditional would be much less than
the size $u$ of the universal plan, since a conditional is a very restricted set of
action-decision rules for achieving a particular goal, whereas a universal plan
encompasses all the intermediate goals and states, and thus probably $d r c_r \ll n u c_r$.
For the same reason, however, U is much more flexible than C. Furthermore, since U
does not check any preconditions prior to execution, if we let

$a$ = the number of (a priori) preconditions in C
$c_p$ = the average cost of verifying a precondition

then C incurs an additional cost $a c_p$ over U for verifying plan applicability. However,
this cost comes with the benefit of a completable planner being able to determine,
before any actions are executed, whether the plan will achieve the goal. In the
imperfectly-characterizable but fairly well-behaved domains for which completable
planning is designed, trading off flexibility for guarantees of success is probably an
advantageous decision. Current work includes designing experiments to investigate
this tradeoff as well as the actual costs incurred, and the relative benefits and
disadvantages brought by completable planning.
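
The cost comparison above can be tried out numerically. The Python sketch below encodes the two cost expressions, using the parameter names n, u, d, r and a from the text and writing the per-rule and per-precondition costs as `c_r` and `c_p`; the sample values are invented for illustration and carry no empirical weight.

```python
def universal_plan_cost(n: int, u: int, c_r: float) -> float:
    """Runtime cost of U: every one of n actions evaluates all u rules."""
    return n * u * c_r


def completable_plan_cost(d: int, r: int, c_r: float,
                          a: int = 0, c_p: float = 0.0) -> float:
    """Cost of C: d conditionals of r rules each, plus a precondition checks."""
    return d * r * c_r + a * c_p


# Hypothetical figures: a 10-step task, 50 universal rules, 3 deferred
# decisions of 4 rules each, and 9 preconditions verified a priori.
cost_u = universal_plan_cost(n=10, u=50, c_r=1)
cost_c = completable_plan_cost(d=3, r=4, c_r=1, a=9, c_p=1)
```

Under these made-up numbers the universal plan costs 500 rule evaluations against 21 units for the completable plan, which includes the a priori verification that the universal plan never performs.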

LEARNING  PROBABLY  COMPLETABLE  PLANS 
The planning problem has provided a wealth of research opportunities for the
learning community, as evidenced by work such as [Chien89, DeJong89, Hammond86,
Minton85, Mitchell90, Mooney88]. In [Gervasio90], explanation-based
learning [DeJong86, Mitchell86] was shown to be useful in learning completable
plans which involved the deferment of determining the length of time over which to
let a process run to achieve a particular value for a continuously-changing quantity.
Such a deferred decision may be represented in the formalization presented in the
previous section by a repeat-loop iterating over a wait or a no-op, with the exit
condition c testing for the goal value. Repeat-loops for incremental progress can thus
be learned by constructing explanations about how the general behavior of repeated
actions guarantees incremental progress towards the goal. Here, we present an
incremental strategy for learning conditionals and repeat-loops for simple
failure-recovery in probably completable plans.

The idea of probably completable plans lends itself naturally to incremental
learning strategies. Conditionals, for example, represent a partitioning of a set of
states into subsets requiring different actions to achieve the same goal. With
probable achievability, a plan may include only some of these subsets. As problems
involving the excluded subsets are encountered, however, the plan can be modified to
include the new conditions and actions. Similarly, incremental learning can be used
to learn failure-recovery strategies within repeat-loops. The motivation behind the
incremental learning of reactive components is similar to the motivation behind
much work on approximations and learning from failure, including [Bennett90,
Chien89, Hammond86, Mostow87, Tadepalli89]. The primary difference between
these approaches and completable planning is that in these approaches, a system has
the ability to correct the assumptions behind its incorrect approximations and thus
tends to converge upon a single correct solution for a problem. In completable
planning, uncertainty is inherent in the knowledge representation itself and the system
instead addresses the problem of ambiguity through reactivity. As a system learns
improved reactive components, it thus tends to increase a plan’s coverage of the
possible states which may be reached during execution.

Learning  Probably  Completable  Conditionals 

Since preconditions for the actions in an action-decision rule may be satisfied
either through initial state information or through the conditions in the rule, a general
plan learned through EBL² must be further processed to distinguish between these
two types of preconditions:

For each precondition pr
    If pr is not satisfied by I
    then If pr is observable
         then Find all operators supported by pr
              Make the execution of those operators conditional on pr
              Remove pr from the general plan’s preconditions.³
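
The procedure above can be sketched in Python as follows. This is a loose illustration rather than the implemented system: preconditions are modelled as plain strings, "satisfied by I" as membership in a set of known literals, and the `Operator` type with its `supports` and `conditions` fields is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Operator:
    name: str
    supports: Set[str]  # preconditions that support this operator
    conditions: List[str] = field(default_factory=list)  # runtime tests


def extract_conditionals(preconditions: List[str], operators: List[Operator],
                         initial_state: Set[str],
                         observable: Set[str]) -> List[str]:
    """Move observable, initially unknown preconditions into runtime tests.

    Returns the preconditions that must still hold a priori.
    """
    remaining: List[str] = []
    for pr in preconditions:
        if pr not in initial_state and pr in observable:
            for op in operators:
                if pr in op.supports:
                    op.conditions.append(pr)  # op now executes only if pr holds
            # pr is checked during execution, so it is dropped from the list
        else:
            remaining.append(pr)
    return remaining


precs = ["conn wyne a r1", "not traff r1", "not flood r1",
         "not acc r1", "not constr r1"]
go = Operator("go wyne a r1", supports=set(precs))
remaining = extract_conditionals(
    precs, [go],
    initial_state={"conn wyne a r1", "not traff r1", "not flood r1"},
    observable={"not acc r1", "not constr r1", "not traff r1"})
```

In this hypothetical run the accident and construction checks become runtime conditions on the `go` operator, while connectivity, traffic and flooding remain a priori preconditions because they are already known in the initial state.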

Recall that for conditionals to be completable, they must satisfy the constraints
of exhaustiveness, observability, and achievement. Since the plans here are derived
from explanations, the constraint of achievement is already satisfied. The procedure
above checks for observability. For the exhaustiveness constraint, let X be the
desired minimum coverage, where X can be a user-supplied value or one computed
from other parameters such as available resources and importance of success.
Coverage can be represented by probabilities, either qualitative or quantitative; in our
case, qualitative (i.e. using qualitative terms such as "usually" to denote high
probability). Then the exhaustiveness constraint is satisfied in a conditional
{COND $c_1 \to q_1$; ...; $c_n \to q_n$} iff the probability of $(c_1 \lor c_2 \lor \ldots \lor c_n)$
is at least X.

2. EBL is used in completable planning to learn macro-operators or general plans. The planner then simply
looks for a single applicable general plan when given a planning problem. No chaining on macro-operators
is performed.

3. The distinction between initial state information and runtime information is an important one in
completable planning, and it is assumed that the learning system keeps track of or is able to reason about what
information was available initially and what came in during execution.
A conditional manifests itself in an explanation as multiple, disjunctive paths
between two nodes (Figure 4a), with a path representing one action-decision rule, its
leaves which cannot be satisfied in the initial state forming the condition and the
operators along the path forming the action.⁴ Since coverage may be incomplete, a
system may at one time fail to satisfy any of the conditions within a conditional, in
which case the system has the option of learning a new alternative (Figure 4b) to solve
the current problem and to increase coverage in future problems (Figure 4c).

[Figure 4 panels: a. old conditional. b. new alternative. c. new conditional.]

Figure 4. Explanation Structures in learning new conditionals.

Merging a new rule into a conditional can be done using the old plan and a plan with
the new alternative as follows:⁵

new-to-add := plan components in new plan not matching any in old plan
old-to-change := plan component in old plan not matching any in new plan
Make a new action-decision rule using new-to-add
Append the new rule to the action-decision rules of old-to-change
For each precondition pr in the new plan
    If pr is not already in the old plan
    then add pr to the preconditions of the old plan.
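
A simplified Python rendering of this merging procedure is given below. It is only a sketch: components are matched by plain equality of their rules rather than through the specific and general bindings the procedure actually relies on, and the `Conditional` and `Plan` types, like the route literals, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Rule = Tuple[Tuple[str, ...], Tuple[str, ...]]  # (conditions, actions)


@dataclass
class Conditional:
    rules: List[Rule]


@dataclass
class Plan:
    components: List[Conditional]
    preconditions: List[str]


def matches(a: Conditional, b: Conditional) -> bool:
    """Assumption: two components match if they share a rule."""
    return any(rule in b.rules for rule in a.rules)


def merge_alternative(old: Plan, new: Plan) -> None:
    """Merge the alternative contained in `new` into `old`, in place."""
    new_to_add = [c for c in new.components
                  if not any(matches(c, o) for o in old.components)]
    old_to_change = [o for o in old.components
                     if not any(matches(o, c) for c in new.components)]
    if not new_to_add or len(old_to_change) != 1:
        raise ValueError("expected exactly one differing old component")
    # Build one rule from all unmatched new components, preserving order.
    conditions = tuple(cond for c in new_to_add
                       for conds, _ in c.rules for cond in conds)
    actions = tuple(act for c in new_to_add
                    for _, acts in c.rules for act in acts)
    old_to_change[0].rules.append((conditions, actions))
    for pr in new.preconditions:
        if pr not in old.preconditions:
            old.preconditions.append(pr)


first = (("clear r1",), ("go wyne a r1",))
last = (("clear r3",), ("go b ruraly r3",))
old_plan = Plan([Conditional([first]),
                 Conditional([(("clear r2",), ("go a b r2",))]),
                 Conditional([last])],
                ["conn r1", "conn r2", "conn r3"])
new_plan = Plan([Conditional([first]),
                 Conditional([(("clear r4",), ("go a x r4",))]),
                 Conditional([(("clear r5",), ("go x b r5",))]),
                 Conditional([last])],
                ["conn r1", "conn r4", "conn r5", "conn r3"])
merge_alternative(old_plan, new_plan)
```

After the call, the middle conditional of `old_plan` holds two rules, the original direct leg and a detour through X, and the old plan's preconditions have gained the two new connectivity literals.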

Learning Probably Completable Repeat-Loops

Repeat-loops  for  simple  failure-recovery  address  the  problem  of  actions  with 
nondeterministic  effects  or  multiple  possible  outcomes,  and  thus  repeat-loops  are 
first  constructed  by  identifying  such  actions  in  the  general  plan: 

For each action a in the plan
    If the outcome of a used in the plan is a probable outcome among others
    then If the desired outcome c is observable
         then Construct a repeat-loop for a.
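
One way to picture this construction step is the Python sketch below. It assumes numeric outcome probabilities and a cut-off `threshold` for what counts as a "probable" outcome; both are inventions of this example, as are the `Action` type and the tuple used to stand for a repeat-loop.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple, Union

Loop = Tuple[str, str, str, str]  # ("REPEAT", action, "UNTIL", condition)


@dataclass
class Action:
    name: str
    desired: str               # the outcome of the action used in the plan
    outcomes: Dict[str, float]  # hypothetical outcome probabilities


def build_repeat_loops(plan: List[Action], observable: Set[str],
                       threshold: float = 0.5) -> List[Union[str, Loop]]:
    """Wrap nondeterministic actions with a probable, observable outcome."""
    result: List[Union[str, Loop]] = []
    for a in plan:
        has_alternatives = len(a.outcomes) > 1
        is_probable = a.outcomes.get(a.desired, 0.0) >= threshold
        if has_alternatives and is_probable and a.desired in observable:
            result.append(("REPEAT", a.name, "UNTIL", a.desired))
        else:
            result.append(a.name)  # left as an unconditional action
    return result


plan = [
    Action("walk_to_wall", "at_wall", {"at_wall": 1.0}),
    Action("pound", "nail_deeper", {"nail_deeper": 0.9, "nail_bent": 0.1}),
]
steps = build_repeat_loops(plan, observable={"at_wall", "nail_deeper"})
```

Only the pounding action is wrapped, since it is the only one with more than one possible outcome; an unobservable desired outcome would leave the action unwrapped.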

Recall that for a repeat-loop for failure recovery to be completable, it must satisfy
the constraint of repeatability aside from the constraints of observability and
achievement. If the unsuccessful outcomes of a do not prevent the repetition of a, then
the repeatability constraint is satisfied, and the probable eventual achievement of the
desired effects is guaranteed. However, for unsuccessful outcomes which deny the
preconditions to a, actions to recover the preconditions must be learned. These
precondition-recovery strategies within a repeat-loop can be characterized as a
conditional, where the different states are the different outcomes, the different actions
are the different precondition-recovery strategies, and the common effect state is the
precondition state of the action a. If we let $u_i$ be an unsuccessful outcome, and $r_i$ be
the recovery strategy for $u_i$, then a repeat-loop eventually takes the form
{REPEAT <q; {COND $u_1 \to r_1$; ...; $u_n \to r_n$}> UNTIL c}. Learning the embedded
conditional for failure recovery can be done as in the previous section.

4. Initially, a plan may contain a conditional containing only one action-decision rule, a case arising when
there is only one known action, some of whose conditions need to be verified at execution.

5. Plans (their explanations) are associated with specific and general bindings (see [Mooney86] for more on
bindings). This procedure uses the specific bindings of both plans to determine equality, and as equal
components and preconditions are found, the combined general bindings are updated to effectively achieve the
merging, with the final general bindings used in the modified old plan.
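
The repeat-loop form with an embedded recovery conditional can be exercised with a small Python interpreter sketch. Everything concrete here is hypothetical: the `run_loop` function, the iteration cap, and the scripted nail-driving example in which a bent nail denies the precondition of pounding until it is straightened.

```python
from typing import Callable, List, Optional, Tuple

Recovery = Tuple[Callable[[], bool], Callable[[], None]]  # (u_i test, r_i)


def run_loop(q: Callable[[], None], recoveries: List[Recovery],
             c: Callable[[], bool], max_iterations: int = 10) -> Optional[int]:
    """Execute q, then the embedded conditional, until c is observed."""
    for iteration in range(1, max_iterations + 1):
        q()
        for u, r in recoveries:  # embedded COND: first matching rule fires
            if u():
                r()              # restore the preconditions of q
                break
        if c():
            return iteration
    return None  # cap reached (the cap is an assumption of this sketch)


state = {"bent": False, "depth": 0}
script = iter(["bent", "ok", "ok"])  # scripted outcomes for reproducibility


def pound() -> None:
    assert not state["bent"], "precondition of q denied"
    if next(script) == "bent":
        state["bent"] = True
    else:
        state["depth"] += 1


def straighten() -> None:
    state["bent"] = False


iterations = run_loop(pound,
                      [(lambda: state["bent"], straighten)],
                      lambda: state["depth"] >= 2)
```

The first pound bends the nail, the recovery rule straightens it, and two further pounds reach the goal depth, so the loop exits after three iterations without ever violating the precondition of `pound`.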

Example 

A completable planning system implemented in Common LISP on an IBM RT
Model 125 was given the task of learning a plan to get from one small city to another
going through two larger cities using a train. The primary source of incompleteness
preventing complete a priori planning is the system’s knowledge with regard to the
state of the railroads. In order for a system to get from one city to another, the cities
have to be connected by a railroad, and the railroad has to be clear. For a railroad
to be considered clear, it must not be flooded, not be congested with traffic, be free
of accidents, and not be under construction. These conditions cannot always be
verified a priori for all railroads, hence the need for conditionals.

The training example involves getting from the city of Wyne to the city of Ruraly,
where the rail connectivity between the two cities is shown in Figure 5. Here, the
railroad A-B is a major railroad and sometimes gets congested. Also, the northern
railroads to and from X, C, and Z are susceptible to flooding. Accidents and
construction may also occur from time to time.



Figure  5.  Rail  connectivity  between  Wyne  and  Ruraly. 


Learning Initial Plan. The initial training example given to the system is the route
Wyne-A-B-Ruraly, since this is generally the quickest way to get from Wyne to
Ruraly.⁶ ⁷ The initial derived plan is processed to determine conditionals and the learned
general plan is shown in Figure 6. This is a plan for getting from one city to another


6. In the learning instances presented here, only the Wyne-to-Ruraly problem is used. This is partly to
simplify the presentation and partly to avoid confounding the problem of learning conditionals with the
generalization-to-N problem [Cohen88, Shavlik87], of which route-planning between several cities can be seen as an
instance. One drawback of this decision is that the issue of evaluating conditionals can be sidestepped for the
meantime; by presenting the system with alternatives in the order of their desirability, the action-decision rules
in the resulting conditional can simply be evaluated in sequence until one applies. A possible non-implementation
solution would be to let all the action-decision rules fire and then to use some metric (for example,
route-length) to decide between the applicable alternatives.

7. The optimization problem is a very interesting research problem in itself, but beyond the scope of the
current research. Here, we assume that the expert provides already optimized solutions as training examples.
PLAN 1
[COMPS
  [COND ((NOT (ACC ?RT22692)) (NOT (CONSTR ?RT22692)))
    -> ((GO ?AGT22688 ?CITY122689 ?CITY222690 ?RT22692))]
  [COND ((NOT (ACC ?RT22697)) (NOT (CONSTR ?RT22697)) (NOT (TRAFF ?RT22697)))
    -> ((GO ?AGT22688 ?CITY222690 ?CITY222695 ?RT22697))]
  [COND ((NOT (ACC ?RT22702)) (NOT (CONSTR ?RT22702)))
    -> ((GO ?AGT22688 ?CITY222695 ?CITY222700 ?RT22702))]]
[PRECS (AT ?AGT22688 ?CITY122689) (CONN ?CITY122689 ?CITY222690 ?RT22692) (NOT (TRAFF ?RT22692))
  (NOT (FLOOD ?RT22692)) (CONN ?CITY222690 ?CITY222695 ?RT22697) (NOT (FLOOD ?RT22697))
  (CONN ?CITY222695 ?CITY222700 ?RT22702) (NOT (TRAFF ?RT22702)) (NOT (FLOOD ?RT22702))]
[EFFS (AT ?AGT22688 ?CITY222700)]
[EXPL: [EXPLANATION for (AT AMATRAK RURALY)]]

Figure  6.  Initial  Learned  Plan.8 

with  two  intermediate  stops,  where  only  the  railroad  between  the  two  intermediate 
cities  is  susceptible  to  heavy  traffic  and  must  thus  be  checked  for  it. 

Learning Alternative Plans. When the system encounters a situation in which none
of the conditions in a conditional is satisfied, it needs to learn a new alternative in
order to get back on track. In this example, (not (has-heavy-traffic A-B)) proves
false just as the system is to execute (go Amatrak A B A-B) to achieve (at Amatrak B),
so the system needs to learn a new route from A to B. The solution given to the
system is the route A-C-B, which gets
the  system  to  B  and  allows  it  to  continue  with  the  next  step  in  its  original  plan  and 
achieve  its  goal  of  getting  to  Ruraly.  From  this  experience,  the  system  modifies  its 
old  plan  to  include  the  new  alternative  of  going  through  another  city  between  the  two 
intermediate  cities.  The  system  thus  now  has  two  alternatives  when  it  gets  to  city 
A.  When  it  encounters  a  situation  in  which  A-B  is  congested  and  A-C  is  flooded, 
it  is  given  another  alternative  solution,  A-D-E-B,  from  which  it  learns  another  plan 
to  get  from  A  to  B  and  modifies  the  old  plan  as  before.  Now,  in  planning  to  get  from 
Wyne  to  Ruraly,  the  system  is  able  to  construct  the  plan  in  Figure  7. 
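
The incremental strategy just described can be sketched in Python as follows. This is an illustrative simplification, not the system's implementation: the Conditional class, the expert function, and the encoding of runtime state as a set of (predicate, railroad) facts are all assumptions made for the sketch. Each rule is guarded only by the conditions on its first leg, later legs are listed without their own nested conditionals, and the explanation-based generalization and the addition of new preconditions are omitted.

```python
class Conditional:
    """Ordered action-decision rules; earlier rules are preferred (hypothetical helper)."""

    def __init__(self):
        self.rules = []  # list of (blocking_facts, route) pairs

    def add_rule(self, blocking_facts, route):
        self.rules.append((frozenset(blocking_facts), list(route)))

    def select(self, state):
        """Return the route of the first rule none of whose blocking facts holds."""
        for blocking, route in self.rules:
            if not (blocking & state):
                return route
        return None


def execute_and_learn(conditional, state, expert):
    """Use an existing rule if one applies; otherwise ask for and store a new one."""
    route = conditional.select(state)
    if route is None:
        blocking, route = expert(state)
        conditional.add_rule(blocking, route)
    return route


def expert(state):
    """Hypothetical teacher mirroring the example: A-C-B first, then A-D-E-B."""
    if ("FLOOD", "A-C") not in state:
        return ({("ACC", "A-C"), ("CONSTR", "A-C"), ("FLOOD", "A-C")}, ["A-C", "C-B"])
    return ({("ACC", "A-D"), ("CONSTR", "A-D")}, ["A-D", "D-E", "E-B"])


# Initial plan: only the direct A-B railroad is known.
a_to_b = Conditional()
a_to_b.add_rule({("ACC", "A-B"), ("CONSTR", "A-B"), ("TRAFF", "A-B")}, ["A-B"])

first = execute_and_learn(a_to_b, {("TRAFF", "A-B")}, expert)
second = execute_and_learn(a_to_b, {("TRAFF", "A-B"), ("FLOOD", "A-C")}, expert)
print(first, second, len(a_to_b.rules))
```

After the two failures, the conditional holds three rules, and a later run with A-B congested selects the A-C-B route without consulting the teacher.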

PLAN 1
[COMPS
  [COND ((NOT (ACC WYNE-A)) (NOT (CONSTR WYNE-A))) -> ((GO AMATRAK WYNE A WYNE-A))]
  [COND ((NOT (ACC A-B)) (NOT (CONSTR A-B)) (NOT (TRAFF A-B))) -> ((GO AMATRAK A B A-B))
        ((NOT (ACC A-C)) (NOT (CONSTR A-C)) (NOT (FLOOD A-C)))
          -> (((GO AMATRAK A C A-C))
              (COND (((NOT (ACC C-B)) (NOT (CONSTR C-B)) (NOT (FLOOD C-B)))
                     -> ((GO AMATRAK C B C-B)))))
        ((NOT (ACC A-D)) (NOT (CONSTR A-D)))
          -> (((GO AMATRAK A D A-D))
              (COND (((NOT (ACC D-E)) (NOT (CONSTR D-E))) -> ((GO AMATRAK D E D-E))))
              (COND (((NOT (ACC E-B)) (NOT (CONSTR E-B))) -> ((GO AMATRAK E B E-B)))))]
  [COND ((NOT (ACC B-RURALY)) (NOT (CONSTR B-RURALY)))
    -> ((GO AMATRAK B RURALY B-RURALY))]]
[PRECS (AT AMATRAK WYNE) (CONN WYNE A WYNE-A) (NOT (TRAFF WYNE-A)) (NOT (FLOOD WYNE-A))
  (CONN A B A-B) (NOT (FLOOD A-B)) (CONN B RURALY B-RURALY) (NOT (TRAFF B-RURALY))
  (NOT (FLOOD B-RURALY)) (CONN A C A-C) (NOT (TRAFF A-C)) (CONN C B C-B) (NOT (TRAFF C-B))
  (CONN A D A-D) (NOT (TRAFF A-D)) (NOT (FLOOD A-D)) (CONN D E D-E) (NOT (TRAFF D-E))
  (NOT (FLOOD D-E)) (CONN E B E-B) (NOT (TRAFF E-B)) (NOT (FLOOD E-B))]
[EFFS (AT AMATRAK RURALY)]
[EXPL: [EXPLANATION for (AT AMATRAK RURALY)]]

Figure  7.  Final  specific  plan  for  getting  to  Ruraly  from  Wyne.9 
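
To make the runtime behavior of such a plan concrete, the following Python sketch executes a plan shaped like the one in Figure 7, evaluating the action-decision rules of each conditional in sequence until one applies, as described in footnote 6. It is an illustration under stated assumptions, not the system's interpreter: the helpers go, no, cond, and run and the tuple encoding are hypothetical, the preconditions (PRECS) are assumed to have been verified already, and the third alternative follows the A-D-E-B route described in the text.

```python
def go(a, b):
    """Action: travel from city a to city b on railroad 'a-b'."""
    return ("GO", a, b, f"{a}-{b}")


def no(rail, *preds):
    """Conditions of the form (NOT (pred rail)), encoded as facts that must be absent."""
    return [(p, rail) for p in preds]


def cond(*rules):
    """Conditional: ordered (conditions, steps) rules."""
    return ("COND", list(rules))


def run(steps, blocked):
    """Execute steps in order; return the railroads travelled, or None on failure."""
    route = []
    for step in steps:
        if step[0] == "GO":
            route.append(step[3])
            continue
        for conditions, body in step[1]:
            if not any(c in blocked for c in conditions):
                sub = run(body, blocked)
                if sub is None:
                    return None  # committed to this rule; no backtracking at runtime
                route.extend(sub)
                break
        else:
            return None  # no rule applies: a new alternative would have to be learned
    return route


PLAN = [
    cond((no("WYNE-A", "ACC", "CONSTR"), [go("WYNE", "A")])),
    cond(
        (no("A-B", "ACC", "CONSTR", "TRAFF"), [go("A", "B")]),
        (no("A-C", "ACC", "CONSTR", "FLOOD"),
         [go("A", "C"), cond((no("C-B", "ACC", "CONSTR", "FLOOD"), [go("C", "B")]))]),
        (no("A-D", "ACC", "CONSTR"),
         [go("A", "D"),
          cond((no("D-E", "ACC", "CONSTR"), [go("D", "E")])),
          cond((no("E-B", "ACC", "CONSTR"), [go("E", "B")]))]),
    ),
    cond((no("B-RURALY", "ACC", "CONSTR"), [go("B", "RURALY")])),
]

print(run(PLAN, set()))
print(run(PLAN, {("TRAFF", "A-B")}))
print(run(PLAN, {("TRAFF", "A-B"), ("FLOOD", "A-C")}))
```

Because the rules are listed in order of desirability, sequential evaluation picks the direct A-B railroad whenever it is usable and falls back to the learned alternatives only when it is not.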


8. For brevity, the following abbreviations have been used: conn for connected, acc for has-accident, constr
for  under-construction,  traff  for  has-heavy-traffic,  and  flood  for  flooded. 

9.  Portions  added  by  new  plans  shown  in  two  sets  of  italics. 
DISCUSSION  &  CONCLUSIONS 

Note that there are usually many possible alternative plans. In the railroad
route-planning problem, for example, there are as many train routes as there are
railroads between any two cities. Unless there is reason to learn new alternatives,
however, effort will not be expended in learning them. This minimizes the
execution-time cost of evaluating conditionals by keeping conditionals small, as
well as the cost of checking preconditions, since new action-decision rules usually
also add preconditions to be checked for plan applicability.

A direction for future work is a more thorough analysis of the tradeoff between
the advantages brought and the costs incurred by completable planning. Aside from the
a priori planning cost completable plans have over reactive plans and the runtime
evaluation cost they have over classical plans, proving achievability sometimes
requires knowledge about the general behavior of actions that is not always
available in traditional action definitions. On the other hand, completable planning
also minimizes a priori information requirements. Another direction for future work
is integrating probabilities more fully into the completable planning framework.
This would involve using qualitative probabilities (as in [Wellman90]) or
quantitative probabilities (as in [Hanks90]) and characterizing their relationship
to achievability, which may help quantify the notion of achievability as well
as open up new areas in which to explore learning.

[Martin90] presents a planning approach wherein an a priori strategic planner
defers to the reactive planner those planning decisions the reactive planner has
proven (through experience) itself capable of handling. In contrast, the achievability
criterion used in our work is knowledge-based rather than empirical, although a
combination of both is currently being investigated. The conditionals in this work are
also related to the work on disjunctive plans, such as [Fox, Mello86]; however, that
work has focused more on the construction of complete, flexible plans for
closed-world manufacturing applications, whereas the incremental learning strategy
presented here was designed precisely for problems where accounting for all
contingencies is expected to be intractable. The idea of incrementally improving a plan’s
coverage has also been investigated in [Drummond90], where a plan’s chance of
achieving the goal is increased through robustification, the gradual consideration of
other possible outcomes of actions and construction of failure-recovery strategies
for them. Here, aside from actions with different possible outcomes, we also deal with
the problem of over-general knowledge. As discussed earlier, there is also much
related work on learning good approximations in planning, including [Bennett90,
Chien89, Hammond86, Mostow87, Tadepalli89].

In this paper, we have extended the idea of completable planning by allowing an
a priori planner to defer goals which are only probably achievable. We defined
completable planning in terms of its plan components, as well as the achievability
constraints on conditionals and repeat-loops, and we discussed various cost tradeoffs.
Finally, we presented and demonstrated incremental strategies for learning
conditionals and repeat-loops in completable plans.